Data Mining from Document-append NoSQL
Authors
Abstract
Due to the unstructured nature of modern digital data, some enterprises have adopted NoSQL storage as their preferred storage facility. NoSQL stores can hold schema-oriented, semi-structured, and schema-less data. One type of NoSQL store is the document-append store (e.g., CouchDB and MongoDB), which has been widely adopted because of its flexibility in storing JSON-based data and files as attachments. However, performing data mining tasks on such stores remains a challenge, and the required tools are generally lacking. Even though there is growing interest in textual data mining, there is a large gap in the engineering solutions that can be applied to document-append storage sources. In this work, we propose a data mining tool for term association detection. The main advantage of the proposed tool is its ability to perform data mining tasks on the document source directly via HTTP, without copying or reformatting the existing JSON data. We adapt the Kalman filter algorithm to accomplish macro tasks such as topic extraction, term organization, term classification, and term clustering. The work is evaluated against existing textual mining tools such as Apache Mahout and R, with promising results on term extraction accuracy.
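To make the idea of mining a document store "directly via HTTP" concrete, here is a minimal Python sketch. It assumes a CouchDB server at a placeholder URL and documents that keep their text in a hypothetical `text` field. The sketch reads CouchDB's `_all_docs?include_docs=true` response as-is and counts term pairs that co-occur in at least `min_support` documents. Plain document co-occurrence stands in here for term association detection. It does not reproduce the authors' Kalman-filter adaptation, which the abstract does not describe. The helper names `fetch_all_docs`, `doc_terms`, and `term_associations` are illustrative, not part of the authors' tool.

```python
import json
import re
import urllib.request
from collections import Counter
from itertools import combinations

TOKEN_RE = re.compile(r"[a-z]+")


def fetch_all_docs(base_url: str, db: str) -> dict:
    """GET {base_url}/{db}/_all_docs?include_docs=true from a CouchDB server.

    Hypothetical helper: base_url and db are placeholders; authentication
    and paging are not handled.
    """
    url = f"{base_url.rstrip('/')}/{db}/_all_docs?include_docs=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def doc_terms(doc: dict, field: str) -> set:
    """Lowercased alphabetic tokens from one text field of a JSON document."""
    text = doc.get(field, "")
    if not isinstance(text, str):
        return set()
    return set(TOKEN_RE.findall(text.lower()))


def term_associations(all_docs: dict, field: str = "text", min_support: int = 2) -> dict:
    """Count in how many documents each term pair co-occurs.

    Works on the parsed _all_docs response itself, so the stored JSON is
    neither copied into another store nor reformatted.
    """
    pair_counts = Counter()
    for row in all_docs.get("rows", []):
        if row.get("id", "").startswith("_design/"):
            continue  # skip design documents (views, validation functions)
        doc = row.get("doc") or {}
        terms = sorted(doc_terms(doc, field))
        pair_counts.update(combinations(terms, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Offline usage with a response shaped like CouchDB's _all_docs output.
sample = {
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "a", "doc": {"_id": "a", "text": "NoSQL document store"}},
        {"id": "b", "doc": {"_id": "b", "text": "document store for JSON"}},
        {"id": "c", "doc": {"_id": "c", "text": "JSON attachment"}},
    ],
}
print(term_associations(sample))  # {('document', 'store'): 2}
```

Against a live server, `term_associations(fetch_all_docs("http://localhost:5984", "mydb"))` would apply the same counting to real data. Both the URL and the database name are placeholders. Sorting each document's terms before pairing keeps every pair in a single canonical order, so `("document", "store")` and `("store", "document")` are counted together.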
Similar resources
Multiterm Keyword Searching For Key Value Based NoSQL System
Today, the enterprise landscape faces a large amount of data. The information gathered from these data sources is useful for improving product and service delivery. However, it is challenging to perform searching activities on these data sources because of their unstructured nature. Due to the unstructured nature of these data, NoSQL storage has been adopted by many enterprises because it provides ...
Performance Evaluation of Analytical Queries on a Standalone and Sharded Document Store
Numerous organizations perform data analytics using relational databases by executing data mining queries. These queries include complex joins and aggregate functions. However, due to an explosion of data in terms of volume, variety, veracity, and velocity known as Big Data [1], many organizations such as Foursquare, Adobe, and Bosch have migrated to NoSQL databases [2] such as MongoDB [3] and ...
Online Mining Changes of Items over Continuous Append-only and Dynamic Data Streams
Online mining changes over data streams has been recognized to be an important task in data mining. Mining changes over data streams is both compelling and challenging. In this paper, we propose a new, single-pass algorithm, called MFC-append (Mining Frequency Changes of append-only data streams), for discovering the frequent frequency-changed items, vibrated frequency changed items, and stable...
Single-Pass Algorithms for Mining Frequency Change Patterns with Limited Space in Evolving Append-Only and Dynamic Transaction Data Streams
In this paper, we propose an online single-pass algorithm MFC-append (Mining Frequency Change patterns in append-only data streams) for online mining frequent frequency change items in continuous append-only data streams. An online space-efficient data structure called ChangeSketch is developed for providing fast response time to compute dynamic frequency changes between data streams. A modifie...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move to a big data world where data is growing in an unstructured way and at high velocity, there is a need for a data store that can hold this large amount of data. Traditionally, relational databases have been used, but they are not suited to handling this amount of data, so it is necessary to move to non-relational data stores. In the current study, we have proposed an extension of the Mongo...